Data point selection for self-training

Author

  • Ines Rehbein
Abstract

Parsing morphologically rich languages is difficult, among other reasons, because less rigid word order constraints lead to higher structural variability and because the number of distinct lexical forms is higher. Both properties can cause sparse-data problems for statistical parsing. We present a simple approach to addressing these issues. Our approach uses self-training on instances selected according to their similarity to the annotated data. Our similarity measure is based on the perplexity of the part-of-speech trigrams of new instances, measured against the annotated training data. Preliminary results show that our method outperforms a self-training setting in which instances are simply selected in order of occurrence in the corpus. We argue that self-training is a cheap and effective method for improving parsing accuracy for morphologically rich languages.
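The selection step described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the abstract does not say how the trigram model is estimated or smoothed, so the add-one smoothing, the sentence padding symbols, and the names `PosTrigramModel` and `select_for_self_training` are all assumptions made for illustration. The idea shown is only this: estimate part-of-speech trigram statistics from the annotated training data, score each unannotated sentence by its perplexity under that model, and keep the lowest-perplexity (most similar) sentences for self-training.

```python
import math
from collections import Counter

# Hypothetical padding symbols; not taken from the paper.
BOS, EOS = "<s>", "</s>"


def trigrams(tags):
    """Return the POS trigrams of one sentence, padded at both ends."""
    padded = [BOS, BOS] + list(tags) + [EOS]
    return [tuple(padded[i:i + 3]) for i in range(len(padded) - 2)]


class PosTrigramModel:
    """POS trigram model with add-one smoothing (an assumed choice)."""

    def __init__(self, tagged_sentences):
        self.tri = Counter()   # counts of (t1, t2, t3)
        self.bi = Counter()    # counts of the context (t1, t2)
        vocab = {EOS}          # tags that can be predicted
        for tags in tagged_sentences:
            vocab.update(tags)
            for gram in trigrams(tags):
                self.tri[gram] += 1
                self.bi[gram[:2]] += 1
        self.vocab_size = len(vocab)

    def perplexity(self, tags):
        """Perplexity of one POS sequence measured against the training data."""
        grams = trigrams(tags)
        log_prob = 0.0
        for gram in grams:
            p = (self.tri[gram] + 1) / (self.bi[gram[:2]] + self.vocab_size)
            log_prob += math.log(p)
        return math.exp(-log_prob / len(grams))


def select_for_self_training(model, candidates, k):
    """Keep the k candidate sentences with the lowest perplexity."""
    return sorted(candidates, key=model.perplexity)[:k]


if __name__ == "__main__":
    annotated = [["DET", "NOUN", "VERB"], ["DET", "ADJ", "NOUN", "VERB"]]
    unannotated = [["VERB", "VERB", "DET", "DET"], ["DET", "NOUN", "VERB"]]
    model = PosTrigramModel(annotated)
    for tags in select_for_self_training(model, unannotated, k=1):
        print(tags, round(model.perplexity(tags), 2))
```

In a full self-training loop, the selected sentences would be parsed automatically and added to the training set before retraining the parser. That part is omitted here because the abstract gives no details about it.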


Similar resources

Negative Selection Based Data Classification with Flexible Boundaries

One of the most important artificial immune algorithms is negative selection algorithm, which is an anomaly detection and pattern recognition technique; however, recent research has shown the successful application of this algorithm in data classification. Most of the negative selection methods consider deterministic boundaries to distinguish between self and non-self-spaces. In this paper, two...


A Boundary-aware Negative Selection Algorithm

Negative selection algorithms generate their detector sets based on the points of self data. In the approach described in this paper, the continuous self region is defined by the collection of self data. This has important differences from the negative selection algorithms that simply take each self point and its vicinity as the self region: when the training self points are used together as a ...


Power-Expected-Posterior Priors for Variable Selection in Gaussian Linear Models

Imaginary training samples are often used in Bayesian statistics to develop prior distributions, with appealing interpretations, for use in model comparison. Expected-posterior priors are defined via imaginary training samples coming from a common underlying predictive distribution m, using an initial baseline prior distribution. These priors can have subjective and also default Bayesian implem...


Correlation between self-concept and academic achievement of students

Introduction: Nowadays, the most important problem of our educational system is the phenomenon of educational subsidence. Therefore, knowing the factors that improve education and prevent this phenomenon is of special importance. Self-concept is among the most frequently studied factors. Many studies indicate a direct relationship between self-concept and educational subsidence, but some experts doubt the direct relationship of...


Comparison of the Effects of Self-Determination Skills Training and Parent Management Training on Externalizing Behavior Problems of Students

This study was carried out to compare the effects of self-determination skills training and parent management training on the externalizing behavior problems of students. This quasi-experimental research had a pretest-posttest, control group design. To achieve research goals, 45 students with externalizing behavior problems who were identified through Child Behavior Checklist (CBCL) and via ran...




Publication date: 2011